Tools.h++ Overview

Written by Thomas Keffer, PhD, Rogue Wave Software, Inc.

© Copyright Rogue Wave Software, Inc. 1989-1993.

History of Tools.h++

Rogue Wave Software's Tools.h++ has its roots in the Data Analysis and Interactive Modeling Software (DAIMS) project at the University of Washington. The intention of the project was to develop reusable mathematical modeling tools for use in fluid dynamics. Bruce Eckel and I evaluated many languages and, in 1987, settled on C++, then in its comparative infancy. We wrote a set of foundation classes which later diverged into two distinct libraries. The more mathematically oriented structures became Math.h++, the first commercially offered C++ library. The fundamental data structures, after feedback from various compiler manufacturers and the growing C++ community, became Tools.h++. One of the older C++ class libraries around, it is now in its sixth version and consists of about 40,000 lines of code (not counting test suites).

Design Philosophy

The C++ language has several design goals that set it apart from most other object-oriented languages. The first and foremost is efficiency: it is possible to write production quality code that is every bit as efficient and small as code that has been written in C, yet more maintainable. A second is a "less is more" philosophy: no feature has been included in the language that will make non-users of the feature suffer. For example, you will not find built-in garbage collection. The result is a skeletal, lean-and-mean language (at least as far as object-oriented languages go) that compiles fast and results in small and efficient code.

Successful libraries play to the strengths of the language they are written in and do not try to mask them. It follows that the fundamental design goal of a library written in C++ should be runtime efficiency. This was the fundamental goal of Tools.h++. We did not want to write a prototyping tool that would be fun to play with, only to be pushed aside when the "real" coding began. Tools.h++ was written with an eye towards "production quality" code.

In general, there are no features that will slow things down for the non-user of the feature. As many decisions as possible are made at compile time, consistent with the C++ philosophy of static type checking. In most cases, Tools.h++ offers a choice between classes with extreme simplicity, but little generality, and classes that are a little more complex, but more general. We have chosen not to require that all classes inherit a secular base class (such as the class Object used by Smalltalk and The NIH Classes) which would require large amounts of unused code to be dragged in.

Another design goal was to adhere to "The Principle of Least Astonishment" for predictability and ease of learning. Many new users of C++ become so giddy with the power of being able to overload esoteric operators like "&=" that they forget about tried-and-true function calls and start redefining everything in sight. Tools.h++ tries to avoid all this. All of the familiar operators work just as you might expect - there are no surprises. We also wanted to write an API that would protect the programmer from large classes of mistakes, by taking advantage of the strong type checking that C++ offers.

Various language "tricks" were kept to a minimum. The goal was fairly generic code that would compile reliably and uneventfully on a wide variety of platforms. Exotic language features such as virtual base classes, with their unreliable implementations, were not used.

Organization

What does the library look like?

Tools.h++ provides implementation, not policy. Hence, it consists mostly of a large and rich set of concrete classes that are usable in isolation and do not depend on other classes for their implementation or semantics: they can be pulled out and used just one or two at a time. The concrete classes consist of a set of simple classes (such as dates, times, strings, etc.) and three different families of collection classes: template-based classes, generic (preprocessor-based) classes, and Smalltalk-like classes.

Regardless of their implementation, all collection classes generally follow the Smalltalk abstractions for collection classes (SortedCollection, Dictionaries, Bags, Sets, etc.) and use similar interfaces, allowing them to be interchanged easily.

The library also includes a set of abstract data types (ADTs), and corresponding specializing classes, that provide a framework for persistence, localization, and other issues, although this is not the central focus of the library. It has been our experience that extensive use of ADTs requires a commitment on the part of the programmer to follow their conventions from the beginning, thus intruding on the overall design.

In the following sections we first discuss the various concrete classes offered by Tools.h++, then its various abstraction facilities. This is followed by a look at various implementation issues including implementation conventions, error handling, and dynamic link libraries.

Simple Classes

Tools.h++ provides a rich set of lightweight simple classes. By "lightweight" we mean classes with low-cost initializers and copy constructors. Examples include RWDate (dates, following the Gregorian calendar), RWTime (times, including support for various time zones and locales), RWCString (single- and multi-byte strings), RWWString (wide character strings), and RWCRegexp (regular expressions). Most of these classes are four bytes or less, with very simple copy constructors (usually just a bit copy) and no virtual functions.

It is worth looking at the string classes in more detail to see how the design objectives were achieved. The goal was for them to be fast enough that the programmer would not be tempted to go back to strcpy() and the like. We also wanted very simple value semantics that would be easy to understand. Both goals were achieved by using copy-on-write and reference counting. Here's a schematic look at class RWCString, used for single- and multi-byte strings (omitting many details critical for performance and robustness in multi-threaded environments):

class RWCStringRef
{
  // All constructors are private:
  RWCStringRef(const char* cstr);
  // NB: deep copy:
  RWCStringRef(const RWCStringRef&);
  ~RWCStringRef();
  void append(const char*);		// Append to self
  .
  .	// etc.
  .
  unsigned short	refs_;		// Reference count
  unsigned		nchars_;	// String length
  unsigned		npts_;		// Length of array_
  char*			array_;		// Array of data
friend class RWCString;
};
class RWCString
{
public:
  RWCString();		// Null string
  RWCString(const char * a)
  { pref_ = new RWCStringRef(a); }
  RWCString(const RWCString& S)
  { pref_ = S.pref_; ++pref_->refs_; }
  ~RWCString()
  { if (--pref_->refs_ == 0) delete pref_; }
  RWCString& append(const char* cstr)
  { cow();
    pref_->append(cstr);
    return *this; }
  .
  .	// etc.
  .
protected:
  void cow()
  { if (pref_->refs_ > 1) clone(); }
  void clone()
  { // NB: Deep copy of representation:
    RWCStringRef* temp = new RWCStringRef(*pref_);
    if (--pref_->refs_ == 0) delete pref_;
    pref_ = temp; }
private:
  RWCStringRef* pref_;
};

The copy constructor of RWCString merely increments a reference count rather than making a whole new copy of the string data. This makes read-only copies very inexpensive. A true copy is made only just before an object is to be modified.
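The effect of copy-on-write can be observed in a small self-contained sketch. The names CowRep and CowString are hypothetical stand-ins introduced for illustration only; this is not the RWCString implementation, and it ignores thread safety entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shared representation: character data plus a reference count.
struct CowRep {
    std::string data;
    std::size_t refs;
    explicit CowRep(const std::string& s) : data(s), refs(1) {}
};

// Hypothetical handle with value semantics, implemented by copy-on-write.
class CowString {
public:
    CowString(const char* s = "") : rep_(new CowRep(s)) {}
    CowString(const CowString& other) : rep_(other.rep_) { ++rep_->refs; }
    CowString& operator=(const CowString& other) {
        ++other.rep_->refs;      // increment first: safe for self-assignment
        release();
        rep_ = other.rep_;
        return *this;
    }
    ~CowString() { release(); }

    CowString& append(const char* s) {
        cow();                   // obtain a private copy before modifying
        rep_->data += s;
        return *this;
    }
    const char* data() const { return rep_->data.c_str(); }
    bool sharesWith(const CowString& other) const { return rep_ == other.rep_; }

private:
    void release() { if (--rep_->refs == 0) delete rep_; }
    void cow() {
        if (rep_->refs > 1) {
            CowRep* copy = new CowRep(rep_->data);  // deep copy of the data
            --rep_->refs;        // cannot reach zero here: refs was > 1
            rep_ = copy;
        }
    }
    CowRep* rep_;
};

// Demonstrates that copying shares data and that writing un-shares it.
void cowDemo() {
    CowString a("abc");
    CowString b(a);              // cheap: only the count is incremented
    assert(a.sharesWith(b));
    b.append("def");             // b clones its representation before writing
    assert(!a.sharesWith(b));
    assert(std::string(a.data()) == "abc");
    assert(std::string(b.data()) == "abcdef");
}
```

Calling cowDemo() exercises both paths: the copy that only bumps the count, and the append that forces a private copy.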

This also allows all null strings to share the same data, making initialization of arrays of RWCStrings fast and efficient:

// Global null string representation, shared by
// all strings:
RWCStringRef* nullStringRef = 0;
// Default constructor becomes inexpensive, with
// no memory allocations:
RWCString::RWCString()
{
  if (nullStringRef==0) nullStringRef = new RWCStringRef("");
  pref_ = nullStringRef;
  pref_->refs_++;
}

Version 6 also includes a wide character string class with facilities for converting to and from multi-byte character strings:

class RWWString
{
public:
  enum widenFrom {ascii, multiByte};
  RWWString(const wchar_t * a);
  RWWString(const char* s,      widenFrom codeset);
  RWWString(const RWCString& s, widenFrom codeset);
  RWBoolean isAscii() const;		// Nothing but ASCII?
  RWCString toAscii() const;		// strip high bytes
  RWCString toMultiByte() const;	// use wcstombs()
  .
  .
  .
};

Ordinarily, conversion from multibyte to wide character string is performed using mbstowcs(). The reverse conversion is performed using wcstombs(). If through other information the character string is known to be ASCII (that is, no multibyte characters and no characters with their high-order bit set) then optimizations may be possible by using the enum "widenFrom" and function toAscii().
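The two conversion paths described above can be sketched with the Standard C library alone. The function names widenMultiByte, widenAscii, narrowMultiByte, and isAscii are hypothetical and chosen for this illustration; they are not members of RWWString. The sketch assumes the program's current C locale is the one the multibyte data was encoded in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// General path: let mbstowcs() interpret the bytes under the current C locale.
// Returns an empty string if the input holds an invalid multibyte sequence.
std::wstring widenMultiByte(const std::string& s) {
    std::vector<wchar_t> buf(s.size() + 1);   // at most one wide char per byte
    std::size_t n = std::mbstowcs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring();
    return std::wstring(buf.data(), n);
}

// Fast path: if the input is known to be pure ASCII, each byte simply widens.
std::wstring widenAscii(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// Reverse conversion through wcstombs(); empty result if a character
// cannot be represented in the current locale.
std::string narrowMultiByte(const std::wstring& w) {
    std::vector<char> buf(w.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), w.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string();
    return std::string(buf.data(), n);
}

// True if every character is in the 7-bit ASCII range.
bool isAscii(const std::wstring& w) {
    for (std::size_t i = 0; i < w.size(); ++i)
        if (static_cast<unsigned long>(w[i]) > 127UL) return false;
    return true;
}

void widenDemo() {
    // In the default "C" locale, ASCII text converts identically on both paths.
    assert(widenMultiByte("hello") == widenAscii("hello"));
    assert(isAscii(widenAscii("hello")));
    assert(narrowMultiByte(L"hello") == "hello");
}
```

The ASCII path avoids any locale lookup, which is the kind of optimization the widenFrom argument makes available when the caller has that extra knowledge.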

Templates

Three different kinds of template-based collection classes are included in Tools.h++:

All collection classes have a corresponding iterator. Multiple iterators can be active on the same collection at the same time.

Using the template-based collection classes can result in a big performance win because so much is known at compile time. For example, sorted collection classes can do a binary search with direct comparison of objects without resorting to function calls:

template <class T> int
RWTValSortedVector<T>::bsearch(const T& key) const
{
  if (entries())
  {
    int top = entries() - 1, bottom = 0, idx;
    while (top>=bottom)
    {
      idx = (top+bottom) >> 1;
      if (key == (*this)(idx))	// Direct, possibly inlined, comparison
        return idx;
      else if (key < (*this)(idx))
        top    = idx - 1;
      else
        bottom = idx + 1;
    }
  }
  // Not found:
  return -1;
}

This can result in extremely fast searches. It is worth mentioning that the original version of this code used an external class to define the comparison semantics. This resulted in much slower code because of the inability of present compilers to optimize out the thicket of inline functions. Hence, we decided to use direct object comparisons. As compilers become better at optimizing, this decision will be revisited. This is an example of our preference for the pragmatic, if theoretically less elegant, in the search for good runtime performance.
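To make the point about compile-time knowledge concrete, here is a minimal sketch of a value-based sorted vector whose search uses the element type's own operator== and operator< directly. The class SortedVector and its member index() are hypothetical names for this illustration, and std::vector is used for storage purely for brevity; this is not the RWTValSortedVector source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sorted vector. T needs only operator== and operator<.
template <class T>
class SortedVector {
public:
    // Insert while keeping the elements in ascending order.
    void insert(const T& value) {
        std::size_t pos = 0;
        while (pos < items_.size() && items_[pos] < value) ++pos;
        items_.insert(items_.begin() + pos, value);
    }
    std::size_t entries() const { return items_.size(); }
    const T& operator()(std::size_t i) const { return items_[i]; }

    // Binary search. Returns the index of key, or -1 if it is absent.
    long index(const T& key) const {
        long bottom = 0;
        long top = static_cast<long>(items_.size()) - 1;
        while (top >= bottom) {
            long mid = bottom + (top - bottom) / 2;
            const T& probe = items_[static_cast<std::size_t>(mid)];
            if (key == probe)            // direct comparison, resolved at compile time
                return mid;
            else if (key < probe)
                top = mid - 1;
            else
                bottom = mid + 1;
        }
        return -1;
    }
private:
    std::vector<T> items_;
};

void sortedVectorDemo() {
    SortedVector<int> v;
    v.insert(5);
    v.insert(1);
    v.insert(3);
    assert(v(0) == 1 && v(1) == 3 && v(2) == 5);
    assert(v.index(3) == 1);
    assert(v.index(4) == -1);
}
```

For a built-in type such as int, both comparisons compile down to single machine instructions, with no comparison object or function pointer in between.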

Generic Collection Classes

Generic collection classes are very similar to templates in concept, except that they use the C++ preprocessor and the header file <generic.h> as their instantiation mechanism. While they are definitely crude, they do have certain advantages until the widespread adoption of templates. For example, they have the same type safe interface:

declare(RWGOrderedVector,RWCString)	// Ordered collection of strings
implement(RWGOrderedVector,RWCString)
RWGOrderedVector(RWCString) vec;
RWCString a("a");
vec.insert(a);		// OK
vec.insert("b");	// Type conversion occurs
RWDate today;
vec.insert(today);	// Rejected!

Because their interface and properties are very similar to templates, generic collection classes can be an important porting tool to support compilers and platforms that may not have templates.
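The preprocessor technique can be sketched without relying on <generic.h> itself. The macros GVECTOR and DECLARE_GVECTOR below are hypothetical, written for this illustration in the same spirit: the class name is built by token pasting, and one macro expansion produces a complete type-safe class. The fixed capacity of 16 is an arbitrary simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical macros: GVECTOR(T) names the class, DECLARE_GVECTOR(T) defines it.
// Comments are kept outside the macro body because of the line continuations.
#define GVECTOR(T) GVector_##T

#define DECLARE_GVECTOR(T)                                           \
class GVECTOR(T) {                                                   \
public:                                                              \
    GVECTOR(T)() : count_(0) {}                                      \
    bool insert(const T& item) {                                     \
        if (count_ == kCapacity) return false;                       \
        items_[count_++] = item;                                     \
        return true;                                                 \
    }                                                                \
    std::size_t entries() const { return count_; }                   \
    const T& operator()(std::size_t i) const { return items_[i]; }   \
private:                                                             \
    enum { kCapacity = 16 };                                         \
    T items_[kCapacity];                                             \
    std::size_t count_;                                              \
};

// Token pasting needs a single-token type name, hence the typedef.
typedef std::string String;

DECLARE_GVECTOR(String)
DECLARE_GVECTOR(int)

void genericDemo() {
    GVECTOR(String) names;
    names.insert(String("a"));   // exact type
    names.insert("b");           // converted to String, as with a template
    assert(names.entries() == 2);
    assert(names(1) == "b");
    // names.insert(42) would be rejected by the compiler.
}
```

The restriction to single-token type names and the absence of comments inside the macro body are typical of the crudeness mentioned above, yet the resulting interface is as type safe as a template's.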

Smalltalk Classes

Tools.h++ also includes a comprehensive set of Smalltalk-like collection classes (e.g., Bag, Set, OrderedCollection, etc.). By "Smalltalk-like" we mean classes that are rooted in a single class, in this case RWCollectable. Because RWCollectable classes can make use of the isomorphic persistence machinery of Tools.h++, all Smalltalk-like classes are fully persistent (see below).

With the widespread adoption of templates by many compilers, these classes will undoubtedly become less important in the future. Nevertheless, they will still offer a number of advantages over templates. For example, heterogeneous Smalltalk-like collections can take advantage of code reuse through polymorphism.
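The code-reuse argument can be illustrated with a short sketch. The root class Collectable and the derived classes Name and Count are hypothetical; they only stand in for the role RWCollectable plays and do not reproduce its interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical single root class.
class Collectable {
public:
    virtual ~Collectable() {}
    virtual std::string describe() const = 0;
};

class Name : public Collectable {
public:
    explicit Name(const std::string& s) : s_(s) {}
    std::string describe() const override { return "Name(" + s_ + ")"; }
private:
    std::string s_;
};

class Count : public Collectable {
public:
    explicit Count(int n) : n_(n) {}
    std::string describe() const override {
        return "Count(" + std::to_string(n_) + ")";
    }
private:
    int n_;
};

// One compiled routine serves every class derived from the root,
// even when different kinds of object sit in the same collection.
std::string describeAll(const std::vector<const Collectable*>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) out += ", ";
        out += items[i]->describe();
    }
    return out;
}

void heterogeneousDemo() {
    Name n("Ada");
    Count c(3);
    std::vector<const Collectable*> items;
    items.push_back(&n);
    items.push_back(&c);
    assert(describeAll(items) == "Name(Ada), Count(3)");
}
```

A template-based collection would need one instantiation per element type and could not mix Name and Count in a single container without a common base of this kind.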

Persistence

The previous sections offered an overview of the concrete classes in Tools.h++. With this section we begin discussing some of the abstractions offered by the library.

All objects that inherit from RWCollectable can enjoy isomorphic persistence: the ability to save and restore not only objects, but also their interrelationships, including multiple references to the same object.
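What "isomorphic" means here can be shown with a deliberately small sketch: an object that is referenced twice is written once, and the second reference is stored as a back-reference, so the restored collection has the same shape as the original. The type Item, the functions saveAll and restoreAll, and the "N"/"R" text format are all hypothetical and are not the Tools.h++ mechanism; names are assumed to contain no whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Item { std::string name; };

// Save: the first visit to an object writes "N <name>",
// every later visit writes "R <index of first visit>".
void saveAll(std::ostream& out, const std::vector<Item*>& items) {
    std::map<const Item*, std::size_t> seen;
    out << items.size() << ' ';
    for (std::size_t i = 0; i < items.size(); ++i) {
        std::map<const Item*, std::size_t>::iterator it = seen.find(items[i]);
        if (it == seen.end()) {
            std::size_t id = seen.size();
            seen[items[i]] = id;
            out << "N " << items[i]->name << ' ';
        } else {
            out << "R " << it->second << ' ';
        }
    }
}

// Restore: each distinct object is created once and reused for back-references.
// The caller owns the distinct objects and must delete each exactly once.
std::vector<Item*> restoreAll(std::istream& in) {
    std::vector<Item*> created;   // distinct objects, in order of first appearance
    std::vector<Item*> result;
    std::size_t count = 0;
    in >> count;
    for (std::size_t i = 0; i < count; ++i) {
        char tag = 0;
        in >> tag;
        if (tag == 'N') {
            Item* p = new Item;
            in >> p->name;
            created.push_back(p);
            result.push_back(p);
        } else if (tag == 'R') {
            std::size_t id = 0;
            in >> id;
            result.push_back(created.at(id));
        } else {
            break;                // malformed input: stop early
        }
    }
    return result;
}

void persistenceDemo() {
    Item a; a.name = "alpha";
    Item b; b.name = "beta";
    std::vector<Item*> original;
    original.push_back(&a);
    original.push_back(&b);
    original.push_back(&a);       // second reference to the same object

    std::stringstream stream;
    saveAll(stream, original);
    std::vector<Item*> copy = restoreAll(stream);

    assert(copy.size() == 3);
    assert(copy[0] == copy[2]);   // sharing is preserved
    assert(copy[0] != copy[1]);
    assert(copy[1]->name == "beta");
    delete copy[0];
    delete copy[1];
}
```

A naive save that wrote every element in full would restore three separate objects, silently changing the structure of the data.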

Persistence is done to and from virtual streams RWvostream (for output) and RWvistream (for input), abstract data types that define an interface for storing primitives and vectors of primitives:

class RWvostream 
{
public:
  virtual RWvostream&	operator<<(char) = 0;
  virtual RWvostream&	operator<<(double) = 0;
  virtual RWvostream&	put(const char* p, unsigned N) = 0;
  virtual RWvostream&	put(const double* p, unsigned N) = 0;
  // etc. ...
};
class RWvistream
{
public:
  virtual RWvistream&	operator>>(char&) = 0;
  virtual RWvistream&	operator>>(double&) = 0;
  virtual RWvistream&	get(char*, unsigned N) = 0;
  virtual RWvistream&	get(double*, unsigned N) = 0;
  // etc. ...
};

Clients need not concern themselves with either the source and sink of bytes or the formatting that will be used. Two types of specializing virtual streams are supplied: classes RWpostream and RWpistream (formatting in a portable ASCII format), and RWbostream and RWbistream (binary formatting). ASCII formatting offers the advantage of portability between operating systems while binary formatting is typically slightly more efficient in space and time. Users can easily develop other specializing classes. For example, SunPro has developed versions of RWvostream and RWvistream for XDR formatting.
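The separation between client code and formatting can be sketched as follows. The classes VOut, TextOut, and BinaryOut, and the function savePoint, are hypothetical names for this illustration; they mimic the idea behind the virtual streams with a single primitive type and are not the Rogue Wave classes.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical abstract output stream with one primitive for brevity.
class VOut {
public:
    virtual ~VOut() {}
    virtual VOut& operator<<(long value) = 0;
};

// Specialization 1: portable text formatting.
class TextOut : public VOut {
public:
    explicit TextOut(std::ostream& s) : s_(s) {}
    VOut& operator<<(long value) override {
        s_ << value << ' ';
        return *this;
    }
private:
    std::ostream& s_;
};

// Specialization 2: raw binary formatting (compact, but not portable).
class BinaryOut : public VOut {
public:
    explicit BinaryOut(std::ostream& s) : s_(s) {}
    VOut& operator<<(long value) override {
        s_.write(reinterpret_cast<const char*>(&value), sizeof value);
        return *this;
    }
private:
    std::ostream& s_;
};

struct Point { long x; long y; };

// Client code is written once, against the abstract interface only.
void savePoint(VOut& out, const Point& p) {
    out << p.x << p.y;
}

void virtualStreamDemo() {
    Point p = { 3, 4 };

    std::ostringstream text;
    TextOut textOut(text);
    savePoint(textOut, p);
    assert(text.str() == "3 4 ");

    std::ostringstream binary;
    BinaryOut binaryOut(binary);
    savePoint(binaryOut, p);
    assert(binary.str().size() == 2 * sizeof(long));
}
```

The same savePoint() call produces either format; the choice is made where the stream is constructed, and the underlying std::ostream decides where the bytes go.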

The actual source and sink of bytes is set by a streambuf, such as filebuf or strstream, normally supplied by the compiler manufacturer. Tools.h++ also includes two specializing streambufs for Microsoft Windows users: RWCLIPstreambuf for persisting to the Windows Clipboard, and RWDDEstreambuf for persisting between applications using DDE (dynamic data exchange). The latter can also be used to implement Object Linking and Embedding (OLE) features.

Internationalization

Version 6 of Tools.h++ introduces very powerful facilities for internationalization and localization. Central to these is the RWLocale class. This is an abstract base class with an interface designed for formatting and parsing things such as numbers, dates, and times:

class RWLocale {
public:
  virtual RWCString asString(long) const = 0;
  virtual RWCString asString(struct tm* tmbuf, char format,
                             const RWZone& = RWZone::local()) const = 0;
  virtual RWBoolean stringToNum (const RWCString&, long*) const = 0;
  virtual RWBoolean stringToDate(const RWCString&, struct tm*) const = 0;
  virtual RWBoolean stringToTime(const RWCString&, struct tm*) const = 0;
  // returns [1..12] (1 for January), 0 for error
  virtual int monthIndex(const RWCString&) const = 0;
  // returns 1 for Monday equivalent, 7 for Sunday, 0 for error.
  virtual int weekdayIndex(const RWCString&) const = 0;
  // the default locale for most functions:
  static const RWLocale& global();
  // A function to set it:
  static const RWLocale* global(const RWLocale*);
  // etc. ...
};

Two specializing versions of RWLocale are supplied: a lightweight "default" version with English names and strftime() formatting conventions, and a version that queries the Standard C locale facilities. The user can easily add other specializing versions that (for example) consult a data base.
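Adding a specializing locale can be sketched in a few lines. The classes Locale, EnglishLocale, and SpanishLocale and the function formatDate are hypothetical and far simpler than RWLocale; they only show how formatting and parsing vary behind one abstract interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical abstract locale: only month names, for brevity.
class Locale {
public:
    virtual ~Locale() {}
    // month is in [1..12]; returns "" for anything else.
    virtual std::string monthName(int month) const = 0;
    // Returns [1..12], or 0 if the name is not recognized.
    int monthIndex(const std::string& name) const {
        for (int m = 1; m <= 12; ++m)
            if (monthName(m) == name) return m;
        return 0;
    }
};

class EnglishLocale : public Locale {
public:
    std::string monthName(int month) const override {
        static const char* const names[12] = {
            "January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December"
        };
        return (month >= 1 && month <= 12) ? names[month - 1] : "";
    }
};

class SpanishLocale : public Locale {
public:
    std::string monthName(int month) const override {
        static const char* const names[12] = {
            "enero", "febrero", "marzo", "abril", "mayo", "junio",
            "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre"
        };
        return (month >= 1 && month <= 12) ? names[month - 1] : "";
    }
};

// A concrete formatting routine that depends only on the abstract interface.
std::string formatDate(int day, int month, int year, const Locale& locale) {
    return std::to_string(day) + " " + locale.monthName(month) + " " +
           std::to_string(year);
}

void localeDemo() {
    EnglishLocale en;
    SpanishLocale es;
    assert(formatDate(12, 3, 1993, en) == "12 March 1993");
    assert(formatDate(12, 3, 1993, es) == "12 marzo 1993");
    assert(es.monthIndex("junio") == 6);
    assert(en.monthIndex("junio") == 0);   // not an English month name
}
```

A version that consults a database would simply be a third class overriding monthName().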

The RWLocale abstract interface is used by the various concrete classes to format information:

class RWDate {
public:
  RWCString asString(char format = 'x',	// 3/12/93 style formatting
                     const RWLocale& locale = RWLocale::global()) const
  {
    struct tm tmbuf;
    extract(&tmbuf);  // Convert date to struct tm
    return locale.asString(&tmbuf, format);
  }
  .
  .
  .
};

Here, the member function asString() converts a date into a string, using the formatting information supplied by the locale argument. The global "default locale" is used by binary operators such as the left-shift operator:

ostream& operator<<(ostream& s, const RWDate& d)
{
  s << d.asString();
  return s;
}

Design and Implementation Conventions

Tools.h++ (as well as the other Rogue Wave libraries) uses a number of design and implementation conventions that make it easier for the user to understand the library.

Information is generally thought of as flowing into a function via its arguments, and then back out via its return value. Functions generally do not change their arguments and so all formal parameters are either passed by value or as a constant reference.

There are two important exceptions. The first is a collection class or other class that must maintain a relationship with another object. In this case, a pointer to the other object is passed, never a reference. This is to remind the user that an interrelationship has been established. Hence:

RWTPtrOrderedVector<T>::insert(T*);
RWModel::addDependent(RWModelClient* client);

and not:

RWTPtrOrderedVector<T>::insert(T&);
RWModel::addDependent(RWModelClient&);

With the latter, it is all too easy to think that the argument is being passed in by value, forgetting that a reference to the argument will be retained. Passing in pointers also discourages passing in a stack based argument, because the address of the argument would have to be taken:

RWTPtrOrderedVector<RWCString> collect;
// Insert stack-based variable:
RWCString s("a string");
collect.insert(&s);	// Looks weird; should ring bells
// Proper idiom:
collect.insert(new RWCString("a string"));

The second exception is objects that require initialization with another object in order to work. In C++, a reference represents a real object. It cannot be nil. Hence, it can be useful in the constructor of an object that requires a "partner" object in order to function. Iterators are an example:

RWOrderedIterator(RWOrdered& ord);

Naming Conventions

Tools.h++ takes great care not to pollute the global name space. All global class, variable, and macro names are prefixed with the letters "RW". This makes it easy to work with other libraries, including such libraries as the X Window System, which use generic names such as "Object" or "Boolean".

All function names start with a lower-case letter, but subsequent words are capitalized. Generally, abbreviations are not used.

Where appropriate, all classes use the same member function name. For example, the number of items in a collection class is always returned by member function "entries()".

Errors

Errors are all too common in coding, yet little attention has been paid to them in the literature. Tools.h++ uses an error model that divides errors up into one of four different categories:

Coding errors

The distinguishing characteristic of coding errors is that their cost of detection exceeds the cost of the operation contemplated. A good example is bounds errors: the cost of checking to make sure an index is in range can exceed the cost of the array access itself. Hence, good performance demands that the library, at least in a production version, not check indices for validity. This will require some minimal level of correctness on the part of the user's program. Anything that falls short is a coding error.

Obviously, to be so classified, such errors must be straightforward, easy to detect, and occur at a relatively low-level. Otherwise, it will be extremely difficult for the user to achieve the goal of eliminating all coding errors.

With Tools.h++, coding errors are discovered and eliminated by compiling the library in a "debug" version, which typically activates a set of PRECONDITION and POSTCONDITION clauses. For example, the debug version includes bounds checking on all array accesses. If the debug version of the library discovers an error, it typically aborts the program.
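A precondition that costs nothing in production can be sketched with a conditional macro. The names SKETCH_DEBUG, SKETCH_PRECONDITION, and IntVector are hypothetical and invented for this illustration; the actual macro names and build switches used by Tools.h++ are not shown in this article.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical switch: the check exists only when SKETCH_DEBUG is defined.
#ifdef SKETCH_DEBUG
#  define SKETCH_PRECONDITION(cond) assert(cond)
#else
#  define SKETCH_PRECONDITION(cond) ((void)0)
#endif

class IntVector {
public:
    explicit IntVector(std::size_t n) : size_(n), data_(new int[n]()) {}
    ~IntVector() { delete[] data_; }

    int& operator()(std::size_t i) {
        SKETCH_PRECONDITION(i < size_);   // bounds check, debug builds only
        return data_[i];
    }
    std::size_t entries() const { return size_; }

private:
    IntVector(const IntVector&);              // copying disabled in this sketch
    IntVector& operator=(const IntVector&);
    std::size_t size_;
    int* data_;
};

void preconditionDemo() {
    IntVector v(3);
    v(1) = 42;            // in range: passes in both debug and production builds
    assert(v(1) == 42);
    assert(v(0) == 0);    // elements start out zero-initialized
    // v(3) would abort a debug build and be undefined in a production build.
}
```

Compiling with -DSKETCH_DEBUG turns the bounds check on; without it, operator() reduces to a plain array access.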

Runtime errors

The distinguishing characteristic of runtime errors is that they cannot reasonably be predicted in advance. An example is using a bad date (e.g., 31 June 1992) to initialize a date object.

The line between a coding error and a runtime error can sometimes be fuzzy. Attempting to set a date object to an invalid date could be regarded as a violated precondition, but this would result in a less than useful library as the date object is probably in a better position to make this judgment than the user.

The response to a runtime error is either to throw an exception or to provide a test for object validity. The program is never aborted.
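The "test for object validity" response can be sketched with a small date class. SimpleDate and its members are hypothetical names for this illustration and do not reproduce the RWDate interface.

```cpp
#include <cassert>

// Hypothetical Gregorian date that records whether it was built from valid input.
class SimpleDate {
public:
    SimpleDate(int day, int month, int year)
        : day_(day), month_(month), year_(year),
          valid_(check(day, month, year)) {}

    // The object, not the caller, decides whether the date makes sense.
    bool isValid() const { return valid_; }
    int day() const { return day_; }
    int month() const { return month_; }
    int year() const { return year_; }

private:
    static bool isLeap(int y) {
        return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    }
    static bool check(int d, int m, int y) {
        static const int days[12] =
            { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
        if (m < 1 || m > 12 || d < 1) return false;
        int limit = days[m - 1] + ((m == 2 && isLeap(y)) ? 1 : 0);
        return d <= limit;
    }
    int day_, month_, year_;
    bool valid_;
};

void validityDemo() {
    SimpleDate bad(31, 6, 1992);    // June has only 30 days
    assert(!bad.isValid());         // reported, not aborted
    SimpleDate good(30, 6, 1992);
    assert(good.isValid());
}
```

The caller can construct a date from untrusted input, such as text typed by a user, and then ask the object whether the result is usable.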

Range errors

Range errors are similar to runtime errors, but the distinguishing characteristic is that they involve a unary or binary operator where there is no opportunity to return a "status value". An example is an arithmetic overflow from legal arguments.

Because a "status value" is not possible, the response is always to throw an exception.

No range errors are possible in Tools.h++ (they are primarily an issue in Rogue Wave's math libraries).
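Although Tools.h++ itself raises no range errors, the pattern is easy to sketch: a binary operator has nowhere to put a status value, so it throws. The class CheckedInt is hypothetical and is not part of any Rogue Wave library.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Hypothetical integer wrapper whose addition detects overflow.
class CheckedInt {
public:
    explicit CheckedInt(int v) : v_(v) {}
    int value() const { return v_; }
private:
    int v_;
};

// operator+ must return the sum, so overflow can only be reported by throwing.
CheckedInt operator+(const CheckedInt& a, const CheckedInt& b) {
    int x = a.value();
    int y = b.value();
    // Test before adding: signed overflow itself is undefined behavior.
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y))
        throw std::overflow_error("CheckedInt: addition overflow");
    return CheckedInt(x + y);
}

void rangeErrorDemo() {
    assert((CheckedInt(2) + CheckedInt(3)).value() == 5);

    bool threw = false;
    try {
        CheckedInt(INT_MAX) + CheckedInt(1);   // both arguments are legal
    } catch (const std::overflow_error&) {
        threw = true;
    }
    assert(threw);
}
```

Both operands are individually valid, which is what distinguishes this case from a violated precondition.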

Acts of God

Or, more formally, failures in abstraction. Two examples are:

These sorts of errors always occur at a highly abstract level and cannot reasonably be predicted in advance. The response is always to throw an exception.

Dynamically Linked Library

Tools.h++ is unusual among foundation classes for being highly DOS and Microsoft Windows aware, yet highly portable.

For example, it can be built as a Microsoft Windows Dynamically Linked Library (DLL), allowing it to be shared between multiple applications, reducing both disk and memory usage.

Allowing a library to be compiled as a DLL is not particularly difficult, but it does require attention to detail. By far the most difficult task is dealing with any global data: a Windows DLL has only one data segment and it is shared between all applications. This means that if one application were to change the global data, all other tasks would see the change. The first defense is to eliminate any global data and, indeed, Tools.h++ has only minimal such data. The remaining data must be managed by an "instance manager" with the job of retrieving the correct piece of global data depending on which application is active, using the task ID as a key. If no data exists for a task, then the manager initializes a new instance.

The only problem comes when the task exits: the DLL should reclaim any instance data and return it to the operating system. This requires patching into the Windows exit procedure code to detect when the task exits.

All of this is handled by a small auxiliary DLL called RWTSD (Task Specific Data). This DLL is available to the programmer to implement other DLLs and can be used independently of Tools.h++.
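The instance-manager idea can be sketched in portable C++. The names TaskId, TaskData, and InstanceManager are hypothetical; this is not the RWTSD interface, and the sketch leaves out the Windows-specific part, namely detecting task exit.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for an operating-system task identifier.
typedef unsigned long TaskId;

// Data that would otherwise be a single global shared by every task.
struct TaskData {
    int errorCount;
    TaskData() : errorCount(0) {}
};

class InstanceManager {
public:
    // Returns the data belonging to the given task,
    // initializing a fresh instance on first use.
    TaskData& dataFor(TaskId task) { return table_[task]; }

    // To be called when a task exits, so its instance data is reclaimed.
    void release(TaskId task) { table_.erase(task); }

    std::size_t liveTasks() const { return table_.size(); }

private:
    std::map<TaskId, TaskData> table_;
};

void instanceManagerDemo() {
    InstanceManager manager;
    manager.dataFor(1).errorCount = 5;          // task 1 changes "its" global
    assert(manager.dataFor(2).errorCount == 0); // task 2 does not see the change
    assert(manager.dataFor(1).errorCount == 5);
    assert(manager.liveTasks() == 2);

    manager.release(1);                         // task 1 exits
    assert(manager.liveTasks() == 1);
}
```

In a real DLL the key would come from the operating system's notion of the current task rather than being passed in by the caller.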

In Closing

This article has emphasized the design and architecture of Tools.h++. There is a danger of thinking that this is what library design is all about. Less sexy, but equally important, is what goes on behind the scenes: test suites, porting, installation scripts, dealing with compiler bugs and quirky operating systems, etc. All of these issues, and many more, must be addressed to deliver a robust, versatile, and portable library.


© Copyright 1995, Rogue Wave Software, Inc.